{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bXCInrb_b96Y"
      },
      "source": [
        "# Convert Bounding Box to Masks"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6kN0IgH7cWlq"
      },
      "source": [
        "The goal is to find the mask of an object using the bounding box coordinates. Then use the mask and image to create a COCO format JSON file. It is required to create a dataset for applying an instance segmentation algorithm."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eXwFSgoLeoQL"
      },
      "source": [
        "\n",
        "To find the mask of an object inside an image, a state-of-an-art algorithm called Deep MAC will be used. Input to the [Deep MAC](https://arxiv.org/abs/2104.00613) algorithm will be the normalized bounding box coordinate and an image. Its output will be a mask. Deep MAC pre trained weights trained on a SpineNet backbone will be used to detect the masks. These weights are available in open source. Deep MAC inference script can be [found here](https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/deepmac_colab.ipynb) as well but we modified it according to the our project's need."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eCPPN3JOeszb"
      },
      "source": [
        "\n",
        "The output mask and its corresponding image will be then used to create a COCO format JSON annotation file using an open source library known as Imantics."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RHjzxxIUfSgg"
      },
      "source": [
        "## Import libraries \u0026 clone the TF models directory"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4aMXIf3tmuuQ"
      },
      "outputs": [],
      "source": [
        "# install additional libraries\n",
        "!pip install -q tf-models-official\n",
        "!pip3 install -q imantics"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "CXfMBkXvHyjg"
      },
      "outputs": [],
      "source": [
        "%matplotlib inline\n",
        "\n",
        "import logging\n",
        "logging.disable(logging.WARNING)\n",
        "\n",
        "from matplotlib import pyplot as plt\n",
        "from matplotlib import patches\n",
        "from PIL import Image\n",
        "import numpy as np\n",
        "import random\n",
        "from skimage import color\n",
        "from skimage.color import rgb_colors\n",
        "from skimage import transform\n",
        "from skimage import util\n",
        "import tensorflow as tf\n",
        "import warnings\n",
        "from imantics import Mask, Category, Image as imantics_Image\n",
        "import json\n",
        "tf.compat.v1.enable_eager_execution()\n",
        "\n",
        "\n",
        "COLORS = ([rgb_colors.cyan, rgb_colors.orange, rgb_colors.pink,\n",
        "           rgb_colors.purple, rgb_colors.limegreen , rgb_colors.crimson] +\n",
        "          [(color) for (name, color) in color.color_dict.items()])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dflD6h1vWW4G"
      },
      "source": [
        "## Visualization functions"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "dOCWTzvSFnk8"
      },
      "outputs": [],
      "source": [
        "def reframe_box_masks_to_image_masks(box_masks, boxes, image_height,\n",
        "                                     image_width, resize_method='bilinear'):\n",
        "  \"\"\"Transforms the box masks back to full image masks.\n",
        "\n",
        "  Embeds masks in bounding boxes of larger masks whose shapes correspond to\n",
        "  image shape.\n",
        "\n",
        "  Args:\n",
        "    box_masks: A tensor of size [num_masks, mask_height, mask_width].\n",
        "    boxes: A tf.float32 tensor of size [num_masks, 4] containing the box\n",
        "           corners. Row i contains [ymin, xmin, ymax, xmax] of the box\n",
        "           corresponding to mask i. Note that the box corners are in\n",
        "           normalized coordinates.\n",
        "    image_height: Image height. The output mask will have the same height as\n",
        "                  the image height.\n",
        "    image_width: Image width. The output mask will have the same width as the\n",
        "                 image width.\n",
        "    resize_method: The resize method, either 'bilinear' or 'nearest'. Note that\n",
        "      'bilinear' is only respected if box_masks is a float.\n",
        "\n",
        "  Returns:\n",
        "    A tensor of size [num_masks, image_height, image_width] with the same dtype\n",
        "    as `box_masks`.\n",
        "  \"\"\"\n",
        "  resize_method = 'nearest' if box_masks.dtype == tf.uint8 else resize_method\n",
        "  def reframe_box_masks_to_image_masks_default():\n",
        "    \"\"\"The default function when there are more than 0 box masks.\"\"\"\n",
        "\n",
        "    num_boxes = tf.shape(box_masks)[0]\n",
        "    box_masks_expanded = tf.expand_dims(box_masks, axis=3)\n",
        "\n",
        "    resized_crops = tf.image.crop_and_resize(\n",
        "        image=box_masks_expanded,\n",
        "        boxes=reframe_image_corners_relative_to_boxes(boxes),\n",
        "        box_indices=tf.range(num_boxes),\n",
        "        crop_size=[image_height, image_width],\n",
        "        method=resize_method,\n",
        "        extrapolation_value=0)\n",
        "    return tf.cast(resized_crops, box_masks.dtype)\n",
        "\n",
        "  image_masks = tf.cond(\n",
        "      tf.shape(box_masks)[0] \u003e 0,\n",
        "      reframe_box_masks_to_image_masks_default,\n",
        "      lambda: tf.zeros([0, image_height, image_width, 1], box_masks.dtype))\n",
        "  return tf.squeeze(image_masks, axis=3)\n",
        "\n",
        "def reframe_image_corners_relative_to_boxes(boxes):\n",
        "  \"\"\"Reframe the image corners ([0, 0, 1, 1]) to be relative to boxes.\n",
        "\n",
        "  The local coordinate frame of each box is assumed to be relative to\n",
        "  its own for corners.\n",
        "\n",
        "  Args:\n",
        "    boxes: A float tensor of [num_boxes, 4] of (ymin, xmin, ymax, xmax)\n",
        "      coordinates in relative coordinate space of each bounding box.\n",
        "\n",
        "  Returns:\n",
        "    reframed_boxes: Reframes boxes with same shape as input.\n",
        "  \"\"\"\n",
        "  ymin, xmin, ymax, xmax = (boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3])\n",
        "\n",
        "  height = tf.maximum(ymax - ymin, 1e-4)\n",
        "  width = tf.maximum(xmax - xmin, 1e-4)\n",
        "\n",
        "  ymin_out = (0 - ymin) / height\n",
        "  xmin_out = (0 - xmin) / width\n",
        "  ymax_out = (1 - ymin) / height\n",
        "  xmax_out = (1 - xmin) / width\n",
        "  return tf.stack([ymin_out, xmin_out, ymax_out, xmax_out], axis=1)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G-gUJ2qffiiH"
      },
      "source": [
        "## Utility functions"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-cWctY5cyUKC"
      },
      "outputs": [],
      "source": [
        "def read_image(path):\n",
        "  \"\"\"Read an image and optionally resize it for better plotting.\"\"\"\n",
        "  with tf.io.gfile.GFile(path, 'rb') as f:\n",
        "    img = Image.open(f)\n",
        "    return np.array(img, dtype=np.uint8)\n",
        "\n",
        "def resize_for_display(image, max_height=600):\n",
        "  height, width, _ = image.shape\n",
        "  width = int(width * max_height / height)\n",
        "  with warnings.catch_warnings():\n",
        "    warnings.simplefilter(\"ignore\", UserWarning)\n",
        "    return util.img_as_ubyte(transform.resize(image, (height, width)))\n",
        "\n",
        "\n",
        "def get_mask_prediction_function(model):\n",
        "  \"\"\"Get single image mask preidction function using a model.\"\"\"\n",
        "\n",
        "  detection_fn = model.signatures['serving_default']\n",
        "\n",
        "\n",
        "  @tf.function\n",
        "  def predict_masks(image, boxes):\n",
        "    height, width, _ = image.shape.as_list()\n",
        "    batch = image[tf.newaxis]\n",
        "    boxes = boxes[tf.newaxis]\n",
        "    detections = detection_fn(images=batch, boxes=boxes)\n",
        "    masks = detections['detection_masks']\n",
        "    return reframe_box_masks_to_image_masks(masks[0], boxes[0],\n",
        "                                                height, width)\n",
        "\n",
        "  return predict_masks\n",
        "\n",
        "\n",
        "def display(im):\n",
        "  plt.figure(figsize=(16, 12))\n",
        "  plt.imshow(im)\n",
        "  plt.show()\n",
        "\n",
        "def plot_image_annotations(image, boxes, masks=None, darken_image=0.7):\n",
        "  fig, ax = plt.subplots(figsize=(16, 12))\n",
        "  ax.set_axis_on()\n",
        "  image = (image * darken_image).astype(np.uint8)\n",
        "  ax.imshow(image)\n",
        "\n",
        "  height, width, _ = image.shape\n",
        "\n",
        "  num_colors = len(COLORS)\n",
        "  color_index = 0\n",
        "  boxes = boxes[:20]\n",
        "\n",
        "  masks_list = masks if masks is not None else [None] * len(boxes)\n",
        "  for box, mask in zip(boxes, masks_list):\n",
        "    ymin, xmin, ymax, xmax = box\n",
        "    ymin *= height\n",
        "    ymax *= height\n",
        "    xmin *= width\n",
        "    xmax *= width\n",
        "\n",
        "    color = COLORS[color_index]\n",
        "    color = np.array(color)\n",
        "    rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,\n",
        "                             linewidth=2.5, edgecolor=color, facecolor='none')\n",
        "    ax.add_patch(rect)\n",
        "\n",
        "    if masks is not None:\n",
        "      mask = (mask \u003e 0.5).astype(np.float32)\n",
        "      color_image = np.ones_like(image) * color[np.newaxis, np.newaxis, :]\n",
        "      color_and_mask = np.concatenate(\n",
        "          [color_image, mask[:, :, np.newaxis]], axis=2)\n",
        "\n",
        "      ax.imshow(color_and_mask, alpha=0.5)\n",
        "\n",
        "    color_index = (color_index + 1) % num_colors\n",
        "\n",
        "  return ax"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1Dn44FimfmId"
      },
      "source": [
        "## Import pre-trained Deep MAC weights"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "A4jhuP2zADfS",
        "outputId": "ae74d5c5-54b8-48af-ea58-197fef886557"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "--2022-08-11 23:49:21--  https://storage.googleapis.com/tf_model_garden/vision/deepmac_maskrcnn/deepmarc_spinenet.zip\n",
            "Resolving storage.googleapis.com (storage.googleapis.com)... 173.194.213.128, 173.194.214.128, 173.194.215.128, ...\n",
            "Connecting to storage.googleapis.com (storage.googleapis.com)|173.194.213.128|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 314902579 (300M) [application/zip]\n",
            "Saving to: ‘deepmarc_spinenet.zip’\n",
            "\n",
            "deepmarc_spinenet.z 100%[===================\u003e] 300.31M   142MB/s    in 2.1s    \n",
            "\n",
            "2022-08-11 23:49:24 (142 MB/s) - ‘deepmarc_spinenet.zip’ saved [314902579/314902579]\n",
            "\n",
            "Archive:  deepmarc_spinenet.zip\n",
            "   creating: deepmarc_spinenet/\n",
            "   creating: deepmarc_spinenet/variables/\n",
            "  inflating: deepmarc_spinenet/variables/variables.data-00000-of-00001  \n",
            "  inflating: deepmarc_spinenet/variables/variables.index  \n",
            "   creating: deepmarc_spinenet/assets/\n",
            "  inflating: deepmarc_spinenet/saved_model.pb  \n"
          ]
        }
      ],
      "source": [
        "!wget https://storage.googleapis.com/tf_model_garden/vision/deepmac_maskrcnn/deepmarc_spinenet.zip\n",
        "!unzip deepmarc_spinenet.zip"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PLLKy18bfzEW"
      },
      "source": [
        "## Load the model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3RGiRcAc7LDU"
      },
      "outputs": [],
      "source": [
        "MODEL = '/content/deepmarc_spinenet/'\n",
        "model = tf.saved_model.load(MODEL)\n",
        "prediction_function = get_mask_prediction_function(model)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "l8suwOvRf5P5"
      },
      "source": [
        "## MUST CHANGE - Modify the path of an image according to your convenience"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "FP8ETRBUFupE",
        "outputId": "4c529603-ea75-480f-a119-90443cd665aa"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
            "                                 Dload  Upload   Total   Spent    Left  Speed\n",
            "100 1235k  100 1235k    0     0  3687k      0 --:--:-- --:--:-- --:--:-- 3676k\n"
          ]
        }
      ],
      "source": [
        "# import an image\n",
        "!curl -O https://raw.githubusercontent.com/tensorflow/models/master/official/projects/waste_identification_ml/pre_processing/config/sample_images/image_3.jpg"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "QlzhbuxxysPy"
      },
      "outputs": [],
      "source": [
        "# path to an image\n",
        "IMAGE_PATH = 'image_3.jpg' #@param {type:\"string\"}\n",
        "\n",
        "# list of bounding box coordinates in the ymin, xmin, ymax, xmax format\n",
        "BB_CORD = [175.0, 815.06625, 948.0, 1630.125] #@param {type:\"raw\"}"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "cL-IU_LN0ycU",
        "outputId": "5595008a-833d-4d72-f534-6dbfedac4806"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "2048 2592\n",
            "0.08544921875 0.3144545717592592 0.462890625 0.62890625\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "array([[0.08544922, 0.31445457, 0.46289062, 0.62890625]])"
            ]
          },
          "execution_count": 9,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# get height and width of an image\n",
        "im = read_image(IMAGE_PATH)\n",
        "height, width, _ = im.shape\n",
        "print(height, width)\n",
        "\n",
        "# convert bounding box coordinates to normalized coordinates\n",
        "YMIN, XMIN, YMAX, XMAX = BB_CORD[0], BB_CORD[1], BB_CORD[2], BB_CORD[3]\n",
        "YMIN_NOR, XMIN_NOR, YMAX_NOR, XMAX_NOR = YMIN/height, XMIN/width, YMAX/height, XMAX/width\n",
        "print(YMIN_NOR, XMIN_NOR, YMAX_NOR, XMAX_NOR)\n",
        "\n",
        "# reshape the coordinates\n",
        "boxes = np.array([YMIN_NOR, XMIN_NOR, YMAX_NOR, XMAX_NOR]).reshape(1,4)\n",
        "boxes"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "wo1ogZ_r5w1-"
      },
      "outputs": [],
      "source": [
        "%matplotlib inline\n",
        "# display bounding box over an image\n",
        "plot_image_annotations(im, boxes)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lzhbOoyWgJSO"
      },
      "source": [
        "## Doing the inference and showing the results"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "m6Yyhh6irf1n"
      },
      "outputs": [],
      "source": [
        "masks = prediction_function(tf.convert_to_tensor(im),\n",
        "                            tf.convert_to_tensor(boxes, dtype=tf.float32))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Uq6vbiRirlLB"
      },
      "outputs": [],
      "source": [
        "plot_image_annotations(im, boxes, masks.numpy())\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WcCAReeIgPBt"
      },
      "source": [
        "## Get the mask"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "nTkvNK6xrzm5"
      },
      "outputs": [],
      "source": [
        "mask = masks[0].numpy().reshape(im.shape[0], im.shape[1])\n",
        "mask = np.where(mask \u003e 0.90, 1, 0)\n",
        "mask = np.array(mask, dtype=np.uint8)\n",
        "display(mask)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gWp0pMCRgRsc"
      },
      "source": [
        "# Convert Mask \u0026 Image to COCO JSON\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "g-7HTpM3sLbs",
        "outputId": "00ffe574-644f-4c90-87b0-be89e682132b"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "64187"
            ]
          },
          "execution_count": 14,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# use of imantics library\n",
        "\n",
        "# give the path of an image\n",
        "image = imantics_Image.from_path(IMAGE_PATH)\n",
        "\n",
        "# array of the mask\n",
        "mask = Mask(mask)\n",
        "\n",
        "# define the category of an object\n",
        "image.add(mask, category=Category(\"Category Name\"))\n",
        "\n",
        "# create a dict of coco\n",
        "coco_json = image.export(style='coco')\n",
        "coco_json.keys()\n",
        "\n",
        "# write coco_json dict to coco.json\n",
        "open('coco.json', \"w\").write(json.dumps(coco_json, indent=4))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "4t8BD-isKTK7",
        "outputId": "01722bbb-84e1-41f5-f0b9-1d3af20e9d6d"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[{'color': '#1fab35',\n",
              "  'id': 1,\n",
              "  'metadata': {},\n",
              "  'name': 'Category Name',\n",
              "  'supercategory': None}]"
            ]
          },
          "execution_count": 15,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# display the categories\n",
        "coco_json['categories']"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "36i8AEKzKabO",
        "outputId": "5eac0496-1ec3-4a53-ee6e-717e673e49d6"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[{'coco_url': None,\n",
              "  'date_captured': None,\n",
              "  'fickr_url': None,\n",
              "  'file_name': 'image_3.jpg',\n",
              "  'height': 2048,\n",
              "  'id': 0,\n",
              "  'license': None,\n",
              "  'metadata': {},\n",
              "  'path': 'image_3.jpg',\n",
              "  'width': 2592}]"
            ]
          },
          "execution_count": 16,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# display image information\n",
        "coco_json['images']"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "FzQMbZqGKfH8",
        "outputId": "d1686f0f-5053-4b73-8353-9cc590635abc"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "(833, 188, 784, 747)"
            ]
          },
          "execution_count": 17,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# display bounding box\n",
        "coco_json['annotations'][0]['bbox']"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "bb_to_mask_to_coco.ipynb",
      "provenance": []
    },
    "gpuClass": "standard",
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
